Chimeric enzymes with improved cellulase activities

ABSTRACT

Nucleic acid molecules encoding chimeric cellulase polypeptides that exhibit improved cellulase activities are disclosed herein. The chimeric cellulase polypeptides encoded by these nucleic acids and methods to produce the cellulases are also described, along with methods of using chimeric cellulases for the conversion of cellulose to sugars such as glucose.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 61/671,454, filed Jul. 13, 2012, the contents of which are incorporated by reference in their entirety.

CONTRACTUAL ORIGIN

The United States Government has rights in this invention under Contract No. DE-AC36-08GO28308 between the United States Department of Energy and Alliance for Sustainable Energy, LLC, the Manager and Operator of the National Renewable Energy Laboratory.

REFERENCE TO SEQUENCE LISTING

This application contains a Sequence Listing submitted as an electronic text file entitled “12-33_ST25.txt,” having a size in bytes of 100 kb and created on Jul. 12, 2013. Pursuant to 37 CFR §1.52(e)(5), the information contained in the above electronic file is hereby incorporated by reference in its entirety.

BACKGROUND

Biofuel is a promising renewable energy technology in part because of the large amount and low cost of its biomass feedstock. Efficient action of cellulases to release fermentable sugars from biomass cellulose is an important step in making this conversion economically viable. The major strategies to improve cellulase activity include rational design and directed evolution. Rational design is based on knowledge of the structure of cellulases, and presumes a detailed understanding of the relationship between enzyme structure and its function, but directed evolution does not require understanding of structure and function.

Clostridium thermocellum is an anaerobic, thermophilic, cellulolytic, and ethanogenic bacterium that shows potential for use in bioenergy production because it is capable of directly converting cellulose into ethanol. Degradation of cellulosic materials by Clostridium thermocellum is carried out by a large extracellular cellulase system called the cellulosome, a complicated protein complex consisting of nearly 20 different catalytic subunits. One feature of the cellulosome is the nonhydrolytic scaffoldin subunit that integrates the various catalytic subunits into the complex via interactions between its repetitive cohesin domains and complementary dockerin domains on the catalytic subunits. Several cellulolytic bacteria and fungi are known to produce extracellular multienzyme complexes similar to the cellulosome.

The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the drawings.

SUMMARY

The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools and methods that are meant to be exemplary and illustrative, not limiting in scope. In various embodiments, one or more of the above-described problems have been reduced or eliminated, while other embodiments are directed to other improvements.

Exemplary embodiments provide isolated nucleic acid molecules that encode chimeric CbhA polypeptides that have cellulase activities greater than wild-type CbhA polypeptides. In certain embodiments, the chimeric CbhA polypeptides comprise domains from Clostridium thermocellum CbhA and Caldicellulosiruptor bescii CelA polypeptides, such as the linker domain from the Caldicellulosiruptor bescii CelA polypeptide.

Additional embodiments provide chimeric CbhA polypeptides that have cellulase activities at least 2-fold greater than wild-type CbhA polypeptides and methods for degrading cellulose or lignocellulosic biomass by contacting a cellulose containing material or lignocellulosic biomass with the isolated chimeric CbhA polypeptides.

In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following descriptions.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments are illustrated in referenced figures of the drawings. It is intended that the embodiments and figures disclosed herein are to be considered illustrative rather than limiting.

FIG. 1 shows the nucleic acid sequence for wild-type CbhA from Clostridium thermocellum (SEQ ID NO:1).

FIG. 2 shows the amino acid sequence for wild-type CbhA from Clostridium thermocellum (SEQ ID NO:2). The putative signal sequence is indicated in bold and the linker domain is underlined.

FIG. 3 shows the nucleic acid sequence for wild-type CelA from Caldicellulosiruptor bescii (SEQ ID NO:3).

FIG. 4 shows the amino acid sequence for wild-type CelA from Caldicellulosiruptor bescii (SEQ ID NO:4). The linker domain is underlined.

FIG. 5 shows the nucleic acid (A; SEQ ID NO:5) and amino acid (B; SEQ ID NO:6) sequences for a chimeric CbhA polypeptide wherein the linker domain from CbhA has been replaced with the linker from Caldicellulosiruptor bescii CelA. The linker domain is underlined in each sequence and the nucleic acid (C; SEQ ID NO:7) and amino acid (D; SEQ ID NO:8) sequences of the linker domain are also provided.

FIG. 6A shows details of three exemplary linkers for the construction of chimeric enzymes. SP, signal peptide; Coh, type-II cohesin; Doc 1, type-I dockerin; Doc 2, type-II dockerin; GH, glycosyl hydrolase family; CBM, family of carbohydrate-binding modules; Ig, immunoglobulin-like fold; X, family of unknown function. FIG. 6B illustrates the domains of the three proteins that contain the linkers.

FIG. 7 shows digestion curves for chimeric and wild-type CbhA enzymes on pretreated corn stover substrates.

FIG. 8 shows the contribution of linkers to activity of CbhA and its chimeras. A, activity of cellulosomal cellulase was assayed directly; B, cellulosomal cellulase was combined with monoscaffoldin of cohesin2-CBM3a to generate minicellulosome first, and then its activity was assayed.

FIG. 9 shows the contribution of linker3 to activity of cellulosomal multifunctional cellulase. Cellulosomal cellulases were combined with monoscaffoldin of Coh-CBM3a to form minicellulosomes first, and then activity of these minicellulosomes was assayed.

FIG. 10 shows the contribution of linker3 to activity of non-cellulosomal multifunctional cellulase.

FIG. 11A-C shows the activity of multifunctional cellulase and its intra-molecular synergy. Cellulosomal cellulases were combined with monoscaffoldin of Coh-CBM3a to form minicellulosomes first, and then activity of these minicellulosomes was assayed.

DETAILED DESCRIPTION

Nucleic acid molecules encoding chimeric cellulase polypeptides that exhibit improved cellulase activities are disclosed herein. The chimeric cellulase polypeptides encoded by these nucleic acids and methods to produce the cellulases are also described, along with methods of using chimeric cellulases for the conversion of cellulose to sugars such as glucose.

Despite efforts to engineer cellulases with significantly improved activities, few successes have been demonstrated. The results of past efforts have been summarized, for example, in a review article by Wilson (Curr. Opin. Biotechnol. 20:295-299 (2009)) (noting that “[a]t this time there are no published reports of engineered cellulases with major (greater than 1.5-fold) increases in activity on crystalline cellulose.”). Prior cellulase engineering has focused upon screening small sets of rationally guided mutations for higher thermal stability and subsequent modest gains in activity at higher conversion temperatures. Significant activity improvement in processive cellulase enzymes on realistic substrates at industrially relevant enzyme loadings and substrate conversion levels remains to be demonstrated.

Disclosed herein are methods for dramatically improving the activity of cellulosomal cellulases (e.g., C. thermocellum CbhA) by exchanging domains with other cellulases to form chimeric polypeptides. In particular, exchanging linker domains in the chimeric polypeptides results in a surprising increase in cellulase activity when compared with the wild-type polypeptides.

CbhA is one of the key cellulosomal cellulases in the C. thermocellum cellulosome system. The nucleic acid (SEQ ID NO:1) and amino acid (SEQ ID NO:2) sequences for wild type CbhA are depicted in FIGS. 1 and 2, respectively.

The improvements in activity exhibited in chimeras by substituting the long linker from C. bescii CelA not only demonstrate the potential of this new approach exploiting modular cooperation to enhance activity of a large cellulosomal cellulase, but also support applying this new approach to improvement of other multimodular cellulases. The high activity and high intramolecular synergy displayed by the chimeric cellulases also demonstrate the promise of enhancing the activity of cellulases (and possibly also that of metabolic enzymes related to biomass conversion) by linking catalytic domains not combined in nature.

The activity enhancement in the chimeras of CbhA and in other artificial multifunctional cellulases may reflect the differing abilities of the linkers to provide the spacing and the flexibility that allow individual modules of the multifunctional peptide to interact productively with the cellulose surface.

Various linkers have been found in cellulases and cellulosomal components such as scaffoldins, the major composition being “PT” or “G” repeats based on their amino acid sequences. It has been suggested that these linkers do not typically form defined structures. Their function may include increasing the solubility of peptides through glycosylation of the threonine (“T”) residues, and making the peptide flexible. The supramolecular cellulosome protein complex keeps many cellulases together, and while it seems that so highly organized a complex would limit the mobility of tethered cellulases, the cellulosome has extremely high activity on insoluble and recalcitrant crystalline cellulose. Having linkers between modules and between peptides may make the attached catalytic modules flexible, resulting in greater mobility of cellulases on insoluble substrates.

What is referred to herein as the “linker domain” of CbhA is two consecutive X1 domains of CbhA. The X1 module may have a disruptive function in the digestion of crystalline cellulose, but these two X1 domains may also form a spacer or linker in the large peptide. Substituting a large linker of C. bescii CelA for the two X1 domains in CbhA resulted in a chimera that was more stable when expressed in E. coli, and exhibited a higher activity than that of wild-type CbhA.

Compared to linkers with “PT” or “G” compositions, linker 3 showed normal amino acid composition, and was very stable in CbhA or truncated CbhA, as well as in artificial multifunctional cellulases. This linker is not easily digested or broken during its overexpression in E. coli, or in storage buffers. Therefore, it is expected that this linker could be used widely in the construction of multifunctional cellulases, and furthermore that it could be used for construction of multifunctional metabolic enzymes. This module has been used to construct multifunctional cellulases and to obtain high intra-molecular synergy.

As used herein, the terms “chimeric polypeptide” or “chimera” refer to a polypeptide composed of parts of different wild-type polypeptides and typically composed of discrete functional domains from different polypeptides. For example, a chimeric CbhA polypeptide may comprise a linker domain from a distinct polypeptide. For exemplary purposes, the present disclosure is directed to chimeric C. thermocellum CbhA polypeptides comprising a linker domain from the CelA polypeptide of C. bescii, such as those depicted in FIG. 5 and represented by SEQ ID NOS:5 and 6. However, the concepts disclosed herein encompass chimeras of CbhA or CelA polypeptides from other bacteria that exhibit enhanced enzymatic activities. The amino acid sequences for the wild-type C. thermocellum CbhA (SEQ ID NO:2) and C. bescii CelA (SEQ ID NO:4) polypeptides and the linker domains of each are illustrated in FIGS. 2 and 4, respectively.

In some embodiments, the chimeras may further comprise one or more binding adaptors bound to the chimeric polypeptide. Binding adaptors may comprise a fusion of a cohesin molecule with a carbohydrate binding module (CBM). One exemplary binding adaptor comprises a fusion of cohesin 2 (the second of the nine Type-I cohesins in the C. thermocellum scaffoldin protein CipA) with a CBM3a module. However, other cohesins and CBMs are also suitable.

The chimeric CbhA polypeptides exhibit surprisingly improved cellulase activities when compared to the wild-type CbhA polypeptides. The term “improved cellulase activity” refers to an increased rate of hydrolysis of a cellulosic substrate. Relative activities for chimeric and wild-type CbhA polypeptides can be determined using conventional assays, including those discussed in the Examples below. Additional assays suitable for determining cellulase activity include hydrolysis assays on industrially relevant cellulose-containing substrates such as pretreated corn stover. Hydrolysis assays on crystalline cellulose or amorphous cellulose or on small molecule fluorescent reporters may also be used to determine cellulase activity. In certain embodiments, cellulase activity is expressed as the amount of time or enzyme concentration needed to reach a certain percentage (e.g., 30%) of cellulose conversion to sugars. For example, as shown in FIG. 7, the digestion times to achieve 30% conversion of a pretreated corn stover cellulose substrate are approximately 46.4 hours for the chimeric CbhA and 98.4 hours for the wild-type CbhA. In this assay, the chimeric CbhA exhibits a 2.12-fold greater cellulase activity than the wild-type CbhA.
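The fold-improvement figure above follows from the ratio of digestion times at a fixed conversion level. The following is a minimal sketch of that arithmetic, not the assay procedure itself. The function names are illustrative, and the linear interpolation used to read a time-to-target off a digestion curve is an assumption rather than the method stated in this disclosure; only the 46.4-hour and 98.4-hour values come from the text.

```python
from typing import Sequence


def time_to_conversion(times: Sequence[float],
                       conversions: Sequence[float],
                       target: float) -> float:
    """Estimate the time at which a digestion curve reaches `target` percent
    conversion, using linear interpolation between sampled points.
    (Interpolation is an illustrative assumption, not the disclosed method.)"""
    points = list(zip(times, conversions))
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return t0 + (target - c0) * (t1 - t0) / (c1 - c0)
    raise ValueError("target conversion is not bracketed by the curve")


def fold_improvement(wild_type_hours: float, chimera_hours: float) -> float:
    """Relative activity expressed as the ratio of times needed to reach
    the same conversion level (shorter time means higher activity)."""
    if wild_type_hours <= 0 or chimera_hours <= 0:
        raise ValueError("digestion times must be positive")
    return wild_type_hours / chimera_hours


# Times to 30% conversion reported in the text for FIG. 7.
fold = fold_improvement(wild_type_hours=98.4, chimera_hours=46.4)
print(f"{fold:.2f}-fold")  # prints "2.12-fold"
```

Expressing activity as a ratio of times at equal conversion, rather than as a ratio of conversions at equal time, keeps the comparison at the same point on the digestion curve for both enzymes.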

In contrast to the results of previous attempts to engineer cellulases, the chimeric CbhA polypeptides herein exhibit cellulase activities that are at least 1.5-fold greater than the wild-type CbhA polypeptide and that can reach at least 3-fold greater activity. In certain embodiments, the chimeric CbhA polypeptides exhibit cellulase activities that are at least 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2-, 2.1-, 2.2-, 2.3-, 2.4-, 2.5-, 2.6-, 2.7-, 2.8-, 2.9-, 3-, 3.1-, 3.2-, 3.3-, 3.4-, or 3.5-fold greater than the wild-type CbhA polypeptide.

“Nucleic acid” or “polynucleotide” as used herein refers to purine- and pyrimidine-containing polymers of any length, either polyribonucleotides or polydeoxyribonucleotides or mixed polyribo-polydeoxyribonucleotides. This includes single- and double-stranded molecules (i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids) as well as “protein nucleic acids” (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases.

Nucleic acids referred to herein as “isolated” are nucleic acids that have been removed from their natural milieu or separated away from the nucleic acids of the genomic DNA or cellular RNA of their source of origin (e.g., as it exists in cells or in a mixture of nucleic acids such as a library), and may have undergone further processing. Isolated nucleic acids include nucleic acids obtained by methods described herein, similar methods or other suitable methods, including essentially pure nucleic acids, nucleic acids produced by chemical synthesis, by combinations of biological and chemical methods, and recombinant nucleic acids that are isolated.

Nucleic acids referred to herein as “recombinant” are nucleic acids which have been produced by recombinant DNA methodology, including those nucleic acids that are generated by procedures that rely upon a method of artificial replication, such as the polymerase chain reaction (PCR) and/or cloning into a vector using restriction enzymes. Recombinant nucleic acids also include those that result from recombination events that occur through the natural mechanisms of cells, but are selected for after the introduction to the cells of nucleic acids designed to allow or make probable a desired recombination event. Portions of isolated nucleic acids that code for polypeptides having a certain function can be identified and isolated by, for example, the method disclosed in U.S. Pat. No. 4,952,501.

An isolated nucleic acid molecule can be isolated from its natural source or produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. Isolated nucleic acid molecules can include, for example, genes, natural allelic variants of genes, coding regions or portions thereof, and coding and/or regulatory regions modified by nucleotide insertions, deletions, substitutions, and/or inversions in a manner such that the modifications do not substantially interfere with the nucleic acid molecule's ability to encode a polypeptide or to form stable hybrids under stringent conditions with natural gene isolates. An isolated nucleic acid molecule can include degeneracies. As used herein, nucleotide degeneracy refers to the phenomenon that one amino acid can be encoded by different nucleotide codons. Thus, the nucleic acid sequence of a nucleic acid molecule that encodes a protein or polypeptide can vary due to degeneracies.
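Nucleotide degeneracy as described above can be illustrated with a short sketch. The two nine-nucleotide sequences below are hypothetical examples chosen for illustration and are not taken from the Sequence Listing; the translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W"
          "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR"
          "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length a multiple of 3) to one-letter
    amino acid codes; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical sequences that differ at two wobble positions.
seq_a = "ATGCCGACC"
seq_b = "ATGCCAACT"
print(translate(seq_a), translate(seq_b))  # prints "MPT MPT"
```

Both sequences encode the same tripeptide even though they differ at two of nine positions, which is why a polypeptide such as SEQ ID NO:6 can be encoded by many distinct nucleic acid sequences.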

Unless so specified, a nucleic acid molecule is not required to encode a protein having protein activity. A nucleic acid molecule can encode a truncated, mutated or inactive protein, for example. In addition, nucleic acid molecules may also be useful as probes and primers for the identification, isolation and/or purification of other nucleic acid molecules, independent of a protein-encoding function.

Suitable nucleic acids include fragments or variants that encode a functional cellulase. For example, a fragment can comprise the minimum nucleotides required to encode a functional cellulase. Nucleic acid variants include nucleic acids with one or more nucleotide additions, deletions, substitutions, including transitions and transversions, insertion, or modifications (e.g., via RNA or DNA analogs). Alterations may occur at the 5′ or 3′ terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among the nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.

In certain embodiments, a nucleic acid may be identical to a sequence represented herein. In other embodiments, the nucleic acids may be at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to a sequence represented herein, or 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to a sequence represented herein. Sequence identity calculations can be performed using computer programs, hybridization methods, or calculations. Exemplary computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package, BLASTN, BLASTX, TBLASTX, and FASTA. The BLAST programs are publicly available from NCBI and other sources. For example, nucleotide sequence identity can be determined by comparing query sequences to sequences in publicly available sequence databases (NCBI) using the BLASTN2 algorithm.
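The percent identity thresholds above can be made concrete with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and simply counts matching columns; it is a simplification for illustration and does not reproduce the alignment or scoring performed by BLASTN or the other programs named above. The example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same
    residue. Inputs must be pre-aligned to the same length; '-' is a gap.
    Columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical 10-nucleotide sequences differing at two positions.
identity = percent_identity("ATGCCGACCA", "ATGCCAACTA")
print(f"{identity:.1f}% identical")  # prints "80.0% identical"
```

Different programs normalize identity differently (for example, by alignment length or by the length of the shorter sequence), so a value computed this way may not match the value a given tool reports.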

Embodiments of the nucleic acids include those that encode a chimeric CbhA polypeptide that functions as a cellulase or functional equivalents thereof. The amino acid sequence of an exemplary chimeric CbhA polypeptide is depicted in FIG. 5 and represented by SEQ ID NO:6. A functional equivalent includes fragments or variants of these that exhibit the ability to function as a cellulase. As a result of the degeneracy of the genetic code, many nucleic acid sequences can encode a polypeptide having, for example, the amino acid sequence of SEQ ID NO:6. Such functionally equivalent variants are contemplated herein.

Altered or variant nucleic acids can be produced by one of skill in the art using the sequence data illustrated herein and standard techniques known in the art. Variant nucleic acids may be detected and isolated by hybridization under high stringency conditions or moderate stringency conditions, for example, which are chosen to prevent hybridization of nucleic acids having non-complementary sequences. “Stringency conditions” for hybridizations is a term of art that refers to the conditions of temperature and buffer concentration that permit hybridization of a particular nucleic acid to another nucleic acid in which the first nucleic acid may be perfectly complementary to the second, or the first and second may share some degree of complementarity that is less than perfect.

Nucleic acids may be derived from a variety of sources including DNA, cDNA, synthetic DNA, synthetic RNA, or combinations thereof. Such sequences may comprise genomic DNA, which may or may not include naturally occurring introns. Moreover, such genomic DNA may be obtained in association with promoter regions or poly (A) sequences. The sequences, genomic DNA, or cDNA may be obtained in any of several ways. Genomic DNA can be extracted and purified from suitable cells by means well known in the art. Alternatively, mRNA can be isolated from a cell and used to produce cDNA by reverse transcription or other means.

Oligonucleotides that are fragments of the nucleic acid sequences disclosed herein and antisense nucleic acids that are complementary, in whole or in part, to those sequences are contemplated herein. Oligonucleotides may be used as primers or probes or for any other use known in the art. Antisense nucleic acids may be used, for example, to inhibit gene expression when introduced into a cell or for any other use known in the art. Oligonucleotides and antisense nucleic acids can be produced by standard techniques known in the art.
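The complementarity that underlies antisense nucleic acids and hybridization probes can be sketched as a reverse-complement operation. The example sequence is hypothetical and not drawn from the Sequence Listing, and the sketch handles only unambiguous DNA bases.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, i.e. the strand that
    would base-pair with it, written 5' to 3'."""
    if set(dna.upper()) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical 9-nucleotide sense sequence.
print(reverse_complement("ATGCCGACC"))  # prints "GGTCGGCAT"
```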

Also disclosed herein are recombinant vectors, including expression vectors, containing nucleic acids encoding chimeric CbhA polypeptides. A “recombinant vector” is a nucleic acid molecule that is used as a tool for manipulating a nucleic acid sequence of choice or for introducing such a nucleic acid sequence into a host cell. A recombinant vector may be suitable for use in cloning, sequencing, or otherwise manipulating the nucleic acid sequence of choice, such as by expressing or delivering the nucleic acid sequence of choice into a host cell to form a recombinant cell. Such a vector typically contains heterologous nucleic acid sequences not naturally found adjacent to a nucleic acid sequence of choice, although the vector can also contain regulatory nucleic acid sequences (e.g., promoters, untranslated regions) that are naturally found adjacent to the nucleic acid sequences of choice or that are useful for expression of the nucleic acid molecules.

A recombinant vector can be either RNA or DNA, either prokaryotic or eukaryotic, and typically is a plasmid. The vector can be maintained as an extrachromosomal element (e.g., a plasmid) or it can be integrated into the chromosome of a recombinant host cell. The entire vector can remain in place within a host cell, or under certain conditions, the plasmid DNA can be deleted, leaving behind the nucleic acid molecule of choice. An integrated nucleic acid molecule can be under chromosomal promoter control, under native or plasmid promoter control, or under a combination of several promoter controls. Single or multiple copies of the nucleic acid molecule can be integrated into the chromosome. A recombinant vector can contain at least one selectable marker.

The term “expression vector” refers to a recombinant vector that is capable of directing the expression of a nucleic acid sequence that has been cloned into it after insertion into a host cell or other (e.g., cell-free) expression system. A nucleic acid sequence is “expressed” when it is transcribed to yield an mRNA sequence. In most cases, this transcript will be translated to yield an amino acid sequence. The cloned gene is usually placed under the control of (i.e., operably linked to) an expression control sequence. The phrase “operatively linked” refers to linking a nucleic acid molecule to an expression control sequence in a manner such that the molecule can be expressed when introduced (i.e., transformed, transduced, transfected, conjugated or conduced) into a host cell.

Recombinant vectors and expression vectors may contain one or more regulatory sequences or expression control sequences. Regulatory sequences broadly encompass expression control sequences (e.g., transcription control sequences or translation control sequences), as well as sequences that allow for vector replication in a host cell. Transcription control sequences are sequences that control the initiation, elongation, or termination of transcription. Suitable regulatory sequences include any sequence that can function in a host cell or organism into which the recombinant nucleic acid molecule is to be introduced, including those that control transcription initiation, such as promoter, enhancer, terminator, operator and repressor sequences. Additional regulatory sequences include translation regulatory sequences, origins of replication, and other regulatory sequences that are compatible with the recombinant cell. The expression vectors may contain elements that allow for constitutive expression or inducible expression of the protein or proteins of interest. Numerous inducible and constitutive expression systems are known in the art.

Typically, an expression vector includes at least one nucleic acid molecule encoding a chimeric CbhA polypeptide operatively linked to one or more expression control sequences (e.g., transcription control sequences or translation control sequences). In one aspect, an expression vector may comprise a nucleic acid encoding a chimeric CbhA polypeptide, as described herein, operably linked to at least one regulatory sequence. It should be understood that the design of the expression vector may depend on such factors as the choice of the host cell to be transformed and/or the type of polypeptide to be expressed.

Expression and recombinant vectors may contain a selectable marker, a gene encoding a protein necessary for survival or growth of a host cell transformed with the vector. The presence of this gene allows growth of only those host cells that express the vector when grown in the appropriate selective media. Typical selection genes encode proteins that confer resistance to antibiotics or other toxic substances, complement auxotrophic deficiencies, or supply critical nutrients not available from a particular media. Markers may be an inducible or non-inducible gene and will generally allow for positive selection. Non-limiting examples of selectable markers include the ampicillin resistance marker (i.e., beta-lactamase), tetracycline resistance marker, neomycin/kanamycin resistance marker (i.e., neomycin phosphotransferase), dihydrofolate reductase, glutamine synthetase, and the like. The choice of the proper selectable marker will depend on the host cell, and appropriate markers for different hosts are understood by those of skill in the art.

Suitable expression vectors may include (or may be derived from) plasmid vectors that are well known in the art, such as those commonly available from commercial sources. Examples include the pET expression vectors. Vectors can contain one or more replication and inheritance systems for cloning or expression, one or more markers for selection in the host, and one or more expression cassettes. The inserted coding sequences can be synthesized by standard methods, isolated from natural sources, or prepared as hybrids. Ligation of the coding sequences to transcriptional regulatory elements or to other amino acid encoding sequences can be carried out using established methods. A large number of vectors, including bacterial, fungal, yeast, and mammalian vectors, have been described for replication and/or expression in various host cells or cell-free systems, and may be used with the secretion sequences described herein for simple cloning or protein expression.

Certain embodiments may employ bacterial promoters or regulatory elements. Examples include the arabinose inducible araBAD promoter (pBAD), the lac promoter, the rhamnose inducible rhaPBAD promoter, the T7 RNA polymerase promoter, the trc and tac promoters, the lambda phage promoter pL, and the anhydrotetracycline-inducible tetA promoter/operator. The efficiency of expression may be enhanced by the inclusion of enhancers that are appropriate for the particular bacterial or fungal cell system that is used, such as those described in the literature.

It will be appreciated by one skilled in the art that use of recombinant DNA technologies can improve control of expression of transformed nucleic acid molecules by manipulating, for example, the number of copies of the nucleic acid molecules within the host cell, the efficiency with which those nucleic acid molecules are transcribed, the efficiency with which the resultant transcripts are translated, and the efficiency of post-translational modifications. Additionally, the promoter sequence might be genetically engineered to improve the level of expression as compared to the native promoter. Recombinant techniques useful for controlling the expression of nucleic acid molecules include, but are not limited to, integration of the nucleic acid molecules into one or more host cell chromosomes, addition of vector stability sequences to plasmids, substitutions or modifications of transcription control signals (e.g., promoters, operators, enhancers), substitutions or modifications of translational control signals (e.g., ribosome binding sites), modification of nucleic acid molecules to correspond to the codon usage of the host cell, and deletion of sequences that destabilize transcripts.

The nucleic acids, including parts or all of expression vectors, may be isolated directly from cells, or, alternatively, the polymerase chain reaction (PCR) method can be used to produce the nucleic acids. Primers used for PCR can be synthesized using the sequence information provided herein and can further be designed to introduce appropriate new restriction sites, if desirable, to facilitate incorporation into a given vector for recombinant expression. The nucleic acids can be produced in large quantities by replication in a suitable host cell (e.g., prokaryotic or eukaryotic cells such as bacteria, fungi, yeast, insect or mammalian cells). The production and purification of nucleic acids are described, for example, in Sambrook et al., 1989; F. M. Ausubel et al., 1992, Current Protocols in Molecular Biology, J. Wiley and Sons, New York, N.Y.

The nucleic acids described herein may be used in methods for production of chimeric CbhA polypeptides through incorporation into cells, tissues, or organisms. In some embodiments, a nucleic acid may be incorporated into a vector for expression in suitable host cells. The vector may then be introduced into one or more host cells by any method known in the art. One method to produce an encoded protein includes transforming a host cell with one or more recombinant nucleic acids (such as expression vectors) to form a recombinant cell. The term “transformation” is generally used herein to refer to any method by which an exogenous nucleic acid molecule (i.e., a recombinant nucleic acid molecule) can be inserted into a cell, but can be used interchangeably with the term “transfection.”

Non-limiting examples of suitable host cells include cells from microorganisms such as bacteria, yeast, fungi, and filamentous fungi. Exemplary microorganisms include, but are not limited to, bacteria such as strains of Bacillus brevis, Bacillus megaterium, Bacillus subtilis, Caulobacter crescentus, and Escherichia coli (e.g., BL21 and K12); filamentous fungi from the genera Trichoderma (e.g., T. reesei, T. viride, T. koningii, or T. harzianum), Penicillium (e.g., P. funiculosum), Humicola (e.g., H. insolens), Chrysosporium (e.g., C. lucknowense), Gliocladium, Aspergillus (e.g., A. niger, A. nidulans, A. awamori, or A. aculeatus), Fusarium, Neurospora, Hypocrea (e.g., H. jecorina), and Emericella; and yeasts from the genera Saccharomyces (e.g., S. cerevisiae), Pichia (e.g., P. pastoris), or Kluyveromyces (e.g., K. lactis). Cells from plants such as Arabidopsis, barley, citrus, cotton, maize, poplar, rice, soybean, sugarcane, wheat, switch grass, alfalfa, miscanthus, and trees such as hardwoods and softwoods are also contemplated herein as host cells.

Host cells can be transformed, transfected, or infected as appropriate by any suitable method including electroporation, calcium chloride-, lithium chloride-, lithium acetate/polyethylene glycol-, calcium phosphate-, DEAE-dextran-, or liposome-mediated DNA uptake, spheroplasting, injection, microinjection, microprojectile bombardment, phage infection, viral infection, or other established methods. Alternatively, vectors containing the nucleic acids of interest can be transcribed in vitro, and the resulting RNA introduced into the host cell by well-known methods, for example, by injection. Exemplary embodiments include a host cell or population of cells expressing one or more nucleic acid molecules or expression vectors described herein (for example, a genetically modified microorganism). The cells into which nucleic acids have been introduced as described above also include the progeny of such cells.

Vectors may be introduced into host cells such as those from bacteria by direct transformation, in which DNA is mixed with the cells and taken up without any additional manipulation, by conjugation, electroporation, or other means known in the art. Expression vectors may be expressed by bacteria or other host cells episomally or the gene of interest may be inserted into the chromosome of the host cell to produce cells that stably express the gene with or without the need for selective pressure. For example, expression cassettes may be targeted to neutral chromosomal sites by recombination.

Host cells carrying an expression vector (i.e., transformants or clones) may be selected using markers depending on the mode of the vector construction. The marker may be on the same or a different DNA molecule. In prokaryotic hosts, the transformant may be selected, for example, by resistance to ampicillin, tetracycline or other antibiotics. Production of a particular product based on temperature sensitivity may also serve as an appropriate marker.

Host cells may be cultured in an appropriate fermentation medium. An appropriate, or effective, fermentation medium refers to any medium in which a host cell, including a genetically modified microorganism, when cultured, is capable of growing or expressing the chimeric polypeptides described herein. Such a medium is typically an aqueous medium comprising assimilable carbon, nitrogen and phosphate sources, but can also include appropriate salts, minerals, metals and other nutrients. Microorganisms and other cells can be cultured in conventional fermentation bioreactors and by any fermentation process, including batch, fed-batch, cell recycle, and continuous fermentation. The pH of the fermentation medium is regulated to a pH suitable for growth of the particular organism. Culture media and conditions for various host cells are known in the art. A wide range of media for culturing bacteria, for example, are available from ATCC. Exemplary culture/fermentation conditions and reagents are provided in the Examples that follow.

The nucleic acid molecules described herein encode chimeric CbhA polypeptides with amino acid sequences such as that represented by SEQ ID NO:6. As used herein, the terms “protein” and “polypeptide” are synonymous. “Peptides” are defined as fragments or portions of polypeptides, preferably fragments or portions having at least one functional activity as the complete polypeptide sequence. “Isolated” proteins or polypeptides are proteins or polypeptides purified to a state beyond that in which they exist in cells. In certain embodiments, they may be at least 10% pure; in others, they may be substantially purified to 80% or 90% purity or greater. Isolated proteins or polypeptides include essentially pure proteins or polypeptides, proteins or polypeptides produced by chemical synthesis or by combinations of biological and chemical methods, and recombinant proteins or polypeptides that are isolated. Proteins or polypeptides referred to herein as “recombinant” are proteins or polypeptides produced by the expression of recombinant nucleic acids.

Proteins or polypeptides encoded by nucleic acids as well as functional portions or variants thereof are also described herein. Polypeptide sequences may be identical to the amino acid sequence of SEQ ID NO:6, or may include up to a certain integer number of amino acid alterations. Such protein or polypeptide variants retain functionality as cellulases, and include mutants differing by the addition, deletion or substitution of one or more amino acid residues, or modified polypeptides and mutants comprising one or more modified residues. The variant may have one or more conservative changes, wherein a substituted amino acid has similar structural or chemical properties (e.g., replacement of leucine with isoleucine). Alterations may occur at the amino- or carboxy-terminal positions of the reference polypeptide sequence or anywhere between those terminal positions, interspersed either individually among the amino acids in the reference sequence or in one or more contiguous groups within the reference sequence.

In certain embodiments, the polypeptides may be at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to SEQ ID NO:6 and possess cellulase function. Percent sequence identity can be calculated using computer programs (such as the BLASTP and TBLASTN programs publicly available from NCBI and other sources) or direct sequence comparison. Polypeptide variants can be produced using techniques known in the art including direct modifications to isolated polypeptides, direct synthesis, or modifications to the nucleic acid sequence encoding the polypeptide using, for example, recombinant DNA techniques.
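As a minimal illustration of the direct-comparison approach, the following sketch computes percent identity for two sequences that have already been aligned, with gaps written as "-". The function name, the choice of denominator (all aligned columns except those in which both sequences have a gap), and the toy sequences are assumptions made for illustration only; programs such as BLASTP apply their own alignment and scoring procedures and may report different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must have the same length; '-' marks a gap.
    Columns in which both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical ten-residue sequences (not taken from SEQ ID NO:6)
# that differ at a single aligned position.
print(percent_identity("MLEDNSSTLP", "MLEDNSSTIP"))
```

Under this convention a gap opposite a residue counts as a mismatch, so insertions and deletions lower the reported identity.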

Modified polypeptides, including those with post-translational modifications, are also contemplated herein. Isolated polypeptides may be modified by, for example, phosphorylation, methylation, farnesylation, carboxymethylation, geranyl geranylation, glycosylation, acetylation, myristoylation, prenylation, palmitoylation, amidation, sulfation, acylation, or other protein modifications. They may also be modified with a label capable of providing a detectable signal, either directly or indirectly, including, but not limited to, radioisotopes and fluorescent compounds. The polypeptides may be useful as antigens for preparing antibodies by standard methods. Monoclonal and polyclonal antibodies that specifically recognize the polypeptides disclosed herein are contemplated.

Chimeric polypeptides may be expressed, isolated and used as stand-alone polypeptides. They may also be fused to one or more additional polypeptides (using, for example, recombinant technology) to create a fusion protein with an additional complete polypeptide or a functional domain of a polypeptide. Suitable fusion segments include segments that can enhance a protein's stability, provide other desirable biological activity, or assist with the purification of the protein (e.g., by affinity chromatography). A suitable fusion segment can be a domain of any size that has the desired function (e.g., imparts increased stability, solubility, action or biological activity; or simplifies purification of a protein).

Chimeric polypeptides may be detected by any assay known in the art to detect a protein of interest. Examples include enzymatic activity assays, detection with specific antibodies (immunoblotting, ELISA, etc.), and other suitable detection techniques.

Chimeric polypeptides may also be isolated or recovered from the media used in host cell cultures or cell-free expression systems. The phrase “recovering the protein” refers to collecting the whole culture medium containing the protein and need not imply additional steps of separation or purification. Proteins can be purified using a variety of standard protein purification techniques, such as affinity chromatography, ion exchange chromatography, filtration, electrophoresis, hydrophobic interaction chromatography, gel filtration chromatography, reverse phase chromatography, concanavalin A chromatography, chromatofocusing, differential solubilization, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, or countercurrent distribution. The polypeptide may contain an additional protein or epitope tag that facilitates detection or purification, such as c-myc, haemagglutinin (HA), polyhistidine, GLU-GLU, FLAG-tag, glutathione-S-transferase (GST), green fluorescent protein (GFP), or maltose binding protein (MBP). Such tags may be removed following the recovery of the polypeptide.

Polypeptides may be retrieved, obtained, or used in “substantially pure” form, a purity that allows for the effective use of the protein in any method described herein or known in the art. For a protein to be most useful in any of the methods described herein or in any method utilizing enzymes of the types described herein, it is most often substantially free of contaminants, other proteins and/or chemicals that might interfere or that would interfere with its use in the method (e.g., that might interfere with enzyme activity), or that at least would be undesirable for inclusion with a protein.

Methods for degrading cellulose and materials containing cellulose using the chimeric CbhA polypeptides are also provided herein. For example, the chimeric CbhA polypeptides may be used in compositions to help degrade (e.g., by liquefaction) a variety of cellulose products (e.g., paper, cotton, etc.) in landfills. The chimeric CbhA polypeptides may also be used to enhance the cleaning ability of detergents, function as a softening agent or improve the feel of cotton fabrics (e.g., stone washing or biopolishing) or in feed compositions.

Cellulose containing materials may also be degraded to sugars using the chimeric CbhA polypeptides. Ethanol may be subsequently produced from the fermentation of sugars derived from the cellulosic materials. Exemplary cellulose-containing materials include bioenergy crops, agricultural residues, municipal solid waste, industrial solid waste, sludge from paper manufacture, yard waste, wood and forestry waste. Examples of biomass include, but are not limited to, corn grain, corn cobs, crop residues such as corn husks, corn stover, corn fiber, grasses, wheat, wheat straw, barley, barley straw, hay, rice straw, switchgrass, waste paper, sugar cane bagasse, sorghum, soy, components obtained from milling of grains, trees, branches, roots, leaves, wood (e.g., poplar) chips, sawdust, shrubs and bushes, vegetables, fruits, flowers and animal manure.

Biofuels such as ethanol may be produced by saccharification and fermentation of lignocellulosic biomass such as trees, herbaceous plants, municipal solid waste and agricultural and forestry residues. Typically, saccharification is carried out by contacting the lignocellulosic biomass with an enzyme cocktail that includes one or more Family 7 cellulases such as the chimeric polypeptides described herein. Such enzyme cocktails may also contain one or more additional cellulases (e.g., a Family 7 cellulase such as Cel7A from T. reesei), endoglucanases (such as the Family 5 endoglucanase E1 from Acidothermus cellulolyticus) or one or more β-glucosidases (e.g., a β-glucosidase from A. niger) to optimize hydrolysis of the lignocelluloses. Additional suitable endoglucanases include EGI, EGII, EGIII, EGIV, EGV or Cel7B (e.g., Cel7B from T. reesei). Enzyme cocktails may also include accessory enzymes such as hemicellulases, pectinases, oxidative enzymes, and the like.

Enzymes with the ability to degrade carbohydrate-containing materials, such as cellulases with endoglucanase activity, exoglucanase activity, or β-glucosidase activity, or hemicellulases with endoxylanase activity, exoxylanase activity, or β-xylosidase activity may be included in enzyme cocktails. Examples include enzymes that possess cellobiohydrolase, α-glucosidase, xylanase, β-xylosidase, α-galactosidase, β-galactosidase, α-amylase, glucoamylase, arabinofuranosidase, mannanase, β-mannosidase, pectinase, acetyl xylan esterase, acetyl mannan esterase, ferulic acid esterase, coumaric acid esterase, pectin methyl esterase, laminarinase, xyloglucanase, galactanase, pectate lyase, chitinase, exo-β-D-glucosaminidase, cellobiose dehydrogenase, ligninase, amylase, glucuronidase, arabinase, lipase, glucosidase or glucomannanase activities.

A lignocellulosic biomass or other cellulosic feedstock may be subjected to pretreatment at an elevated temperature in the presence of a dilute acid, concentrated acid or dilute alkali solution for a time sufficient to at least partially hydrolyze the hemicellulose components before adding the enzyme cocktail. Additional suitable pretreatment regimens include ammonia fiber expansion (AFEX), treatment with hot water or steam, or lime pretreatment.

Separate saccharification and fermentation is a process whereby cellulose present in biomass is converted to glucose that is subsequently converted to ethanol by yeast strains. Simultaneous saccharification and fermentation is a process whereby cellulose present in biomass is converted to glucose and, at the same time and in the same reactor, converted into ethanol by yeast strains. Enzyme cocktails may be added to the biomass prior to or at the same time as the addition of a fermentative organism.

The resulting products after cellulase degradation may also be converted to products other than ethanol. Examples include conversion to higher alcohols, hydrocarbons, or other advanced fuels via biological or chemical pathways, or combination thereof.

EXAMPLES

Example 1 Cloning, Expression, and Purification

Nucleic acids encoding the proteins or protein domains were synthesized directly or were cloned from C. thermocellum. Cellulase or scaffoldin genes were amplified from the genomic DNA of C. thermocellum (ATCC 27405), using the primers listed in Table 1.

TABLE 1

| Primer | Nucleotide sequence | Gene cloning |
|---|---|---|
| F-CelG-NheI | ACAGCAGCTAGCGCCGTCGACAGCAACAACG (SEQ ID NO: 9) | CelG |
| R-CelG-XhoI | TGTTGACTCGAGGGTGGTGTGCGGCAGTTTGTC (SEQ ID NO: 10) | CelG |
| F-CelG-XhoI | ACAGCACTCGAGGCCGTCGACAGCAACAAC (SEQ ID NO: 11) | CelG |
| F-CelA-NcoI | CTGTGTCCATGGCAGGTGTGCCTTTTAACACA (SEQ ID NO: 12) | CelA |
| R-CelA-XhoI | CCCATTCTCGAGATAAGGTAGGTGGGGTATGC (SEQ ID NO: 13) | CelA |
| F-CelA-XhoI | ACTGTGCTCGAGGCAGGTGTGCCTTTTAA (SEQ ID NO: 14) | CelA |
| F-CelR-NcoI | CTGTTTCCATGGCAGACTATAACTATGGAGAA (SEQ ID NO: 15) | CelR |
| R-CelR-XhoI | ACGATACTCGAGTGAATTTCCGGGTATGGTTG (SEQ ID NO: 16) | CelR |
| F-CelR-XhoI | CCTGTTCTCGAGGCAGACTATAACTATGGAG (SEQ ID NO: 17) | CelR |
| F-CelS-XhoI | ACTGCACTCGAGGGTCCTACAAAGGCACCTA (SEQ ID NO: 18) | CelS |
| R-CelS-XhoI | ATCAGTTTTGCTCGAGGTTCTTGTACGGCAATGTAT (SEQ ID NO: 19) | CelS |
| F-CelY-XhoI | AGCTTTCTCGAGTCCAGACAATCATCCAATTC (SEQ ID NO: 20) | CelY |
| R-CelY-XhoI | AGTTTCCTCGAGTGAATTGCTGTCATCAGAGT (SEQ ID NO: 21) | CelY |
| F-CbhA-NdeI | TCCGTGCATATGTTAGAAGATAATTCTTCGACT (SEQ ID NO: 22) | CbhA |
| R-CbhA-XhoI | CAGATTCTCGAGTCGATATGGCAATTCTTCTAT (SEQ ID NO: 23) | CbhA |
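As a simple consistency check on the primer design, the sketch below confirms that each primer of Table 1 contains the recognition sequence of the restriction enzyme named in its label. The primer sequences are those of Table 1 with the line-wrapped fragments joined. The recognition sequences are the standard ones for NheI, XhoI, NcoI and NdeI. The helper name `site_position` is an assumption made for illustration.

```python
# Recognition sequences of the restriction enzymes named in the primer labels.
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG", "NcoI": "CCATGG", "NdeI": "CATATG"}

# Primer sequences from Table 1, with wrapped fragments joined.
PRIMERS = {
    "F-CelG-NheI": "ACAGCAGCTAGCGCCGTCGACAGCAACAACG",
    "R-CelG-XhoI": "TGTTGACTCGAGGGTGGTGTGCGGCAGTTTGTC",
    "F-CelG-XhoI": "ACAGCACTCGAGGCCGTCGACAGCAACAAC",
    "F-CelA-NcoI": "CTGTGTCCATGGCAGGTGTGCCTTTTAACACA",
    "R-CelA-XhoI": "CCCATTCTCGAGATAAGGTAGGTGGGGTATGC",
    "F-CelA-XhoI": "ACTGTGCTCGAGGCAGGTGTGCCTTTTAA",
    "F-CelR-NcoI": "CTGTTTCCATGGCAGACTATAACTATGGAGAA",
    "R-CelR-XhoI": "ACGATACTCGAGTGAATTTCCGGGTATGGTTG",
    "F-CelR-XhoI": "CCTGTTCTCGAGGCAGACTATAACTATGGAG",
    "F-CelS-XhoI": "ACTGCACTCGAGGGTCCTACAAAGGCACCTA",
    "R-CelS-XhoI": "ATCAGTTTTGCTCGAGGTTCTTGTACGGCAATGTAT",
    "F-CelY-XhoI": "AGCTTTCTCGAGTCCAGACAATCATCCAATTC",
    "R-CelY-XhoI": "AGTTTCCTCGAGTGAATTGCTGTCATCAGAGT",
    "F-CbhA-NdeI": "TCCGTGCATATGTTAGAAGATAATTCTTCGACT",
    "R-CbhA-XhoI": "CAGATTCTCGAGTCGATATGGCAATTCTTCTAT",
}


def site_position(label: str, sequence: str) -> int:
    """Return the 0-based index of the named enzyme's site, or -1 if absent."""
    enzyme = label.rsplit("-", 1)[1]  # e.g. "F-CelG-NheI" -> "NheI"
    return sequence.find(SITES[enzyme])


missing = [label for label, seq in PRIMERS.items()
           if site_position(label, seq) < 0]
print("primers lacking their named site:", missing)
```

An empty list indicates that every primer carries the restriction site implied by its name.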

Genes encoding cellulases, monoscaffoldins and multifunctional cellulases were overexpressed in the BL21(DE3) strain of E. coli (Stratagene, La Jolla, Calif.) in the presence of 0.3 mM IPTG at either 16° C. or 37° C. Recombinant proteins were purified in His-tagged form by nickel-nitrilotriacetic acid (Ni-NTA) affinity chromatography (Qiagen, Valencia, Calif.). Table 2 lists the chimeric proteins made and their module structures.

TABLE 2

| Gene or gene components | Module structure |
|---|---|
| CelG | GH5-Doc |
| CelA | GH8-Doc |
| CelR | GH9-CBM3c-Doc |
| CbhA^(a)-linker1-doc | CBM4-Ig-GH9-linker1-Doc |
| CbhA^(a)-linker3-doc | CBM4-Ig-GH9-linker3-Doc |
| CbhA^(a)-linker1-CelG | CBM4-Ig-GH9-linker1-GH5-Doc |
| CbhA^(a)-linker2-CelG | CBM4-Ig-GH9-linker2-GH5-Doc |
| CbhA^(a)-linker3-CelG | CBM4-Ig-GH9-linker3-GH5-Doc |
| CbhA^(a)-linker1-CelA | CBM4-Ig-GH9-linker1-GH8-Doc |
| CbhA^(a)-linker2-CelA | CBM4-Ig-GH9-linker2-GH8-Doc |
| CbhA^(a)-linker3-CelA | CBM4-Ig-GH9-linker3-GH8-Doc |
| CbhA^(a)-linker1-CelR | CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc |
| CbhA^(a)-linker2-CelR | CBM4-Ig-GH9-linker2-GH9-CBM3c-Doc |
| CbhA^(a)-linker3-CelR | CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc |
| CbhA^(a)-CelS | CBM4-Ig-GH9-GH48-Doc |
| CbhA^(a)-linker3-CelS | CBM4-Ig-GH9-linker3-GH48-Doc |
| CbhA^(a)-CelY^(b) | CBM4-Ig-GH9-GH48 |
| CbhA^(a)-linker3-CelY^(b) | CBM4-Ig-GH9-linker3-GH48 |
| Monoscaffoldin (truncated CipA) | Cohesin 2-CBM3a |
| Chimera1 of CbhA | CBM4-Ig-GH9-linker1-CBM3b-Doc |
| Chimera2 of CbhA | CBM4-Ig-GH9-linker2-CBM3b-Doc |
| CbhA | CBM4-Ig-GH9-linker3-CBM3b-Doc |

GH, glycosyl hydrolase family; CBM, carbohydrate binding module; Coh, type-I cohesin; Doc, type-I dockerin; Ig, immunoglobulin-like fold; ^(a)truncated CbhA (CBM4-Ig-GH9); ^(b)truncated CelY (GH48 module only).

Example 2 Cellulase Activity Assay

Constructs or natural enzyme sequences ending in a C-terminal “doc” (dockerin) were assayed after being mixed with a “binding adaptor” consisting of the fusion of cohesin 2 (the second of the nine Type-I cohesins in the C. thermocellum scaffoldin protein CipA) with a CBM3a module from the same scaffoldin to provide the construct or enzyme with a C-terminal family-3 carbohydrate-binding module which could bind to crystalline cellulose.

Cellulase activity was measured under anaerobic conditions using microcrystalline cellulose (Avicel PH-101, Fluka, Sigma-Aldrich Corp., St. Louis, Mo.) as substrate. Enzymes were loaded at a standard molar concentration of 2.0 micromoles/L (or 400 nmol/g cellulose), working against a standard substrate (Avicel) loading of 5.0 mg/mL. Assays were carried out at 60° C. in 20 mM acetate, pH 5.0, containing 10 mM CaCl₂, 5.0 mM L-cysteine and 2 mM EDTA to promote stability of the anaerobe-derived cellulases. Each assay mixture also included Aspergillus niger β-glucosidase (chromatographically purified from the commercial mixture Novozym 188 (Novozymes North America, Franklinton, N.C., USA)) at a concentration of 0.005 mg/mL (or 1.0 mg/g of cellulose substrate), to maintain cellobiose concentrations below the levels at which cellobiose inhibition of the enzymes is measurable.
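The per-gram loadings follow from the volumetric concentrations and the substrate loading by simple division. The sketch below works through that arithmetic; the function and variable names are hypothetical and are introduced only for illustration.

```python
def per_gram_of_substrate(amount_per_liter: float, substrate_g_per_liter: float) -> float:
    """Convert a volumetric loading (amount/L) to a substrate-based loading (amount/g)."""
    return amount_per_liter / substrate_g_per_liter


# Values stated in the assay description.
substrate_g_per_liter = 5.0              # 5.0 mg/mL Avicel = 5.0 g/L
enzyme_umol_per_liter = 2.0              # 2.0 micromoles/L of cellulase
beta_glucosidase_mg_per_liter = 5.0      # 0.005 mg/mL = 5.0 mg/L

enzyme_umol_per_g = per_gram_of_substrate(enzyme_umol_per_liter, substrate_g_per_liter)
enzyme_nmol_per_g = enzyme_umol_per_g * 1000.0
bg_mg_per_g = per_gram_of_substrate(beta_glucosidase_mg_per_liter, substrate_g_per_liter)

print(f"cellulase loading: {enzyme_umol_per_g:.1f} umol/g ({enzyme_nmol_per_g:.0f} nmol/g)")
print(f"beta-glucosidase loading: {bg_mg_per_g:.1f} mg/g")
```

With the stated values, the cellulase loading works out to 0.4 micromoles (400 nmol) per gram of cellulose and the β-glucosidase loading to 1.0 mg per gram.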

Assays were carried out in triplicate, in initial digestion volumes of 1.0 mL in crimp-sealed 2.0 mL HPLC vials, with constant mixing by inversion at 10/min in a rotating incubator inside a glove box maintaining an atmosphere of 5% hydrogen, 95% nitrogen. At designated times during the digestions, representative 0.1 mL aliquots of liquid and solids were withdrawn for analysis, with the digestion vials being opened and then re-capped anaerobically inside the glove-box. The withdrawn aliquots of digestion mixture were diluted 18-fold with deionized water in sealed 2.0 mL HPLC vials, which were then immersed for 10 minutes in a boiling water bath to terminate the enzyme reactions. The diluted digestion-mixture aliquots were then syringe-filtered (0.2 μm) before quantification of released sugars by HPLC. HPLC sugar analyses were carried out on a Bio-Rad (Hercules, Calif.) HPX-87H column operated at 65° C. with 0.01 N H₂SO₄ (0.6 mL/min) as mobile phase in an Agilent (Santa Clara, Calif.) 1100-series liquid chromatograph with refractive-index detection.

Example 3 Linkers for Construction of Chimeras

As shown in FIG. 6, linkers from the following sources were used: Clostridium thermocellum Orf2P (linker1), Caldicellulosiruptor bescii CelA (linker2), and two consecutive X1 domains from wild-type Clostridium thermocellum CbhA (linker3). These three linkers have amino-acid compositions characterized as “PT” (proline/threonine-rich, as in linker2), “PT and G” (proline/threonine-rich with additional glycine-rich regions, as in linker1), or “generic,” with no unique compositional features (linker3).

Example 4 Chimeric Enzymes and Activities

Two CbhA chimeras were made in which both of the consecutive X1 domains of CbhA were replaced. In one case, the domains were replaced by a linker (linker1) isolated from the C. thermocellum scaffoldin gene Orf2P to form Chimera1 (CBM4-Ig-GH9-linker1-CBM3b-Doc), and in the other case by a linker (linker2) from Caldicellulosiruptor bescii CelA to form Chimera2 (CBM4-Ig-GH9-linker2-CBM3b-Doc).

Activities against crystalline cellulose of wild-type CbhA and its two chimeras were measured in two different experimental setups. In the first, the three constructs were assayed in “bare-dockerin” form (i.e., no cohesin-CBM3a construct was added to augment the chimera with a second CBM3 at the C-terminus). In the second, each dockerin-bearing construct was mixed before assay with a monoscaffoldin binding adaptor formed by fusing cohesin 2 and CBM3a of C. thermocellum CipA.

Progress curves for saccharification of Avicel by the bare-dockerin CbhA constructs are shown in FIG. 8A. At the end of a 119 hour digestion, Chimera2, with the PT-composition CelA linker, had solubilized 75% as much of the Avicel as had the construct retaining the generic, wild-type linker; Chimera1, with the PT & G Orf2P linker, solubilized 62% as much Avicel as did the wild-type construct.

The addition of a Family-3a CBM to the C-terminus of each of the above three constructs results in striking differences in their relative activities against Avicel. All three of the constructs have their activities boosted by addition of the CBM3a, but the two constructs with non-native linker domains are helped more than is the construct that has the wild-type repeated X1 domains. Based on yields of soluble sugar after 119 hour digestion, the activities of the constructs with linker1 and 2 are increased by factors of 1.55 and 1.73, with respect to their yields without the Coh2-CBM3 adduct, whereas the 119 hour yield for the wild-type (linker3) construct is increased by a factor of only 1.09. As a result of the greater enhancement of activity of the “linker2” and “linker1” constructs, the 119 hour yield of the linker2 construct is now 1.2 times that of the wild-type-linker construct, and the yield of the digestion by the linker1 construct has pulled up to 0.88 times that of the wild-type (from 0.62 times wild-type without the Coh2-CBM3a binding adaptor).
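The relative yields quoted above are mutually consistent, as the following sketch illustrates. It takes the bare-dockerin yields relative to wild type (0.62, 0.75 and 1.00) and the reported enhancement factors (1.55, 1.73 and 1.09), and recomputes the relative yields in the presence of the binding adaptor. The dictionary names are hypothetical, and the calculation is offered only as an arithmetic check, not as the procedure used to generate the reported values.

```python
# 119 hour yields without the Coh2-CBM3a adaptor, relative to the wild-type construct.
bare_relative = {"linker1": 0.62, "linker2": 0.75, "linker3": 1.00}

# Reported factor by which each construct's yield increases when the adaptor is added.
boost = {"linker1": 1.55, "linker2": 1.73, "linker3": 1.09}

# Yield with adaptor, still expressed in units of the bare wild-type yield.
with_adaptor = {name: bare_relative[name] * boost[name] for name in bare_relative}

# Re-normalize to the wild-type (linker3) construct with adaptor.
relative_with_adaptor = {name: value / with_adaptor["linker3"]
                         for name, value in with_adaptor.items()}

for name, value in relative_with_adaptor.items():
    print(f"{name}: {value:.2f} x wild-type")
```

The recomputed ratios, approximately 0.88 for linker1 and 1.19 for linker2, agree with the stated values of 0.88 and 1.2.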

Example 5 Influence of Linkers on Enzyme Activities

The activities of multimodular cellulase peptides incorporating more than one catalytic domain may depend not only upon the types of activities being combined, but also upon the ordering of the catalytic domains in the peptide, and upon the properties of the linker segments used to connect them. The importance of linker-segment properties is illustrated in FIGS. 9 and 10.

FIG. 9 compares the Avicelase activities of two engineered multifunctional (multi-catalytic) cellulases each containing, in the same order, a truncated C. thermocellum CbhA (N-terminal CBM4 through the GH9 catalytic module) connected at its C-terminus to another cellulosomal catalytic module (GH48, CelS). The difference between the two constructs is in the way in which the two catalytic domains are connected. The upper curve in FIG. 9 (for the more active construct) has the two catalytic domains connected through the two X1 sequences found C-terminal to the GH9 domain in CbhA; the lower curve shows saccharification by a construct in which the GH48 domain was connected directly to the GH9 domain, without any special linker domain. The construct lacking the special linker segment, which may be required to provide the proper spacing and/or flexibility to allow both catalytic domains to engage the substrate effectively, is seen to convert 34% less of the substrate in 125 h than is converted by the construct with the wild-type CbhA double-X1 domains (linker3).

In a manner similar to that used for the constructs in FIG. 9, the same N-terminal portion of C. thermocellum CbhA was connected, in one case directly and, in the other, through an intervening linker3 (double X1) segment, to another C. thermocellum GH48 catalytic domain, this time from the non-cellulosomal CelY. The resulting difference in activity (FIG. 10), although not as dramatic as that seen in FIG. 9, appears statistically significant given the relatively small standard errors of the triplicate determinations and is in the same direction, i.e., the construct with the intervening linker solubilizes more of the cellulose in a 119 hour digestion than does the construct with the two catalytic domains linked directly.

In a systematic study aimed at further elucidating the contributions of linker-segment properties (and of combinations of catalytic domains) to multifunctional cellulase activity, a total of nine multifunctional cellulase genes were designed to test a 3×3 matrix in which each of three C. thermocellum catalytic domains was in turn connected to the C-terminus of the truncated CbhA described earlier, by means of each of three compositionally different linker sequences (FIG. 6). Catalytic modules representing glycohydrolase families 5, 8, and 9 were cloned from the genes for CelG, CelA, and CelR respectively. Genes for other modules, such as those for CBM3c and 4, and for the Ig-like domains and dockerin-1 were those contiguous to the targeted catalytic domains in the genome and were obtained along with the catalytic domains as single gene segments. Constructs shown in Table 3 were built using these gene segments, cloned into E. coli, overexpressed and purified.

TABLE 3

| Gene component | Module structure | Conversion of Avicel (%) |
|---|---|---|
| CbhA^(a)-linker1-CelG | CBM4-Ig-GH9-linker1-GH5-Doc | 48.9 ± 0.77 |
| CbhA^(a)-linker2-CelG | CBM4-Ig-GH9-linker2-GH5-Doc | 57.5 ± 0.92 |
| CbhA^(a)-linker3-CelG | CBM4-Ig-GH9-linker3-GH5-Doc | 58.6 ± 1.31 |
| CbhA^(a)-linker1-CelA | CBM4-Ig-GH9-linker1-GH8-Doc | 41.5 ± 0.10 |
| CbhA^(a)-linker2-CelA | CBM4-Ig-GH9-linker2-GH8-Doc | 47.6 ± 0.67 |
| CbhA^(a)-linker3-CelA | CBM4-Ig-GH9-linker3-GH8-Doc | 50.2 ± 0.29 |
| CbhA^(a)-linker1-CelR | CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc | 51.5 ± 0.28 |
| CbhA^(a)-linker2-CelR | CBM4-Ig-GH9-linker2-GH9-CBM3c-Doc | 45.6 ± 0.15 |
| CbhA^(a)-linker3-CelR | CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc | 40.6 ± 0.39 |

The activities of additional artificial multifunctional cellulases have been tested, in a 3×3 matrix of three different pairs of catalytic domains and three different linker sequences. The results of assays against Avicel PH101 (Table 3) showed that the effectiveness of a given linker in construction of active multifunctional cellulases is dependent upon the identity of the catalytic (and other) domains being connected. In the case of pairing of the N-terminal sequence CBM4-Ig-GH9 with the C-terminal sequence GH5-Doc, the multifunctional enzyme constructed using Linker3 released almost 20% more soluble sugar in a 70.3 hour digestion than did the same two sequences connected by Linker1 (58.6% of potential soluble sugar for Linker3 vs. 48.9% for Linker1). In contrast, however, when the C-terminal “catalytic” sequence was changed to GH9-CBM3c-Doc (with the same N-terminal sequence as before), the apparent effectiveness of the two linkers was reversed, with the multifunctional enzyme connected by Linker1 releasing almost 27% more soluble sugar than did the Linker3 construct. Broader trends in this data set may be seen in the orthogonal array presentation of Table 4. CelG and CelA prefer the linkers in the order Linker3>Linker2>Linker1. CelR reverses this preference, with activities decreasing in the order Linker1>Linker2>Linker3. From the viewpoint of the linkers, Linker3 and Linker2 both prefer the C-terminal catalytic domains in the order CelG>CelA>CelR. Linker1 departs from this trend: the variation of yield with C-terminal catalytic domain is not monotonic. Yield is less with CelA than with CelG as C-terminal catalytic domain, as is the case with Linker2 and Linker3, but linking CelR to truncated CbhA through Linker1 results in a construct more active than either of the corresponding CelA or CelG Linker1 constructs.

TABLE 4

| | CelG (GH5) | CelA (GH8) | CelR (GH9) |
|---|---|---|---|
| Linker1 | 48.9 | 41.5 | 51.5 |
| Linker2 | 57.5 | 47.6 | 45.6 |
| Linker3 | 58.6 | 50.2 | 40.6 |

These results show that choice of linkers is important for the activity of multifunctional cellulases, and that the contributions of linkers to activity of multifunctional cellulases are dependent on the combinations of catalytic modules.
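The linker preferences and percentage differences discussed for Table 4 can be reproduced directly from the tabulated conversions. The sketch below does so; the data structure and function names are hypothetical and serve only to illustrate the comparison.

```python
# Conversion of Avicel (%) from Table 4, indexed by linker and C-terminal catalytic domain.
CONVERSION = {
    "Linker1": {"CelG": 48.9, "CelA": 41.5, "CelR": 51.5},
    "Linker2": {"CelG": 57.5, "CelA": 47.6, "CelR": 45.6},
    "Linker3": {"CelG": 58.6, "CelA": 50.2, "CelR": 40.6},
}


def linker_ranking(domain: str) -> list:
    """Linkers ordered from highest to lowest conversion for one catalytic domain."""
    return sorted(CONVERSION, key=lambda linker: CONVERSION[linker][domain], reverse=True)


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


for domain in ("CelG", "CelA", "CelR"):
    print(domain, " > ".join(linker_ranking(domain)))

# Linker3 vs. Linker1 with CelG, and Linker1 vs. Linker3 with CelR.
print(round(percent_more(CONVERSION["Linker3"]["CelG"], CONVERSION["Linker1"]["CelG"]), 1))
print(round(percent_more(CONVERSION["Linker1"]["CelR"], CONVERSION["Linker3"]["CelR"]), 1))
```

The two percentages evaluate to about 19.8 and 26.8, corresponding to the "almost 20%" and "almost 27%" differences noted above.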

Example 6 Intra-Molecular Synergy

In order to investigate the intra-molecular synergy resulting from combining catalytic domains into multifunctional cellulases, Avicelase activities of some of the artificial multifunctional cellulases listed in Table 3 were compared with the activities of their component modules assayed as simple mixtures, rather than as covalently-linked multifunctionals (Table 5). Out of five multifunctional cellulases evaluated in this way, four showed significant intra-molecular synergism, i.e., CBM4-Ig-GH9-linker1-GH5-Doc, CBM4-Ig-GH9-linker3-GH5-Doc, CBM4-Ig-GH9-linker1-GH8-Doc, CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc displayed intramolecular synergism of 1.12, 1.27, 1.53 and 1.48, respectively, when compared with simple mixtures of their component segments, assayed at the same molar loadings. One of the multifunctional cellulases, CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc did not show significant synergy, but neither did it show any reduction of activity relative to that of a simple mixture of the two parent individual cellulases. The highest observed intra-molecular synergism was 1.53, demonstrating that construction of multifunctional cellulases is a practical approach to improving cellulase activity.
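The intra-molecular synergism values quoted above are consistent with the ratio of the conversion achieved by the covalently linked construct to the conversion achieved by the simple mixture of its components. The sketch below recomputes that ratio from the Table 5 conversions. Treating the synergism as this simple ratio is an assumption that matches the tabulated values, and the names used are hypothetical.

```python
# (conversion by fused construct, conversion by simple mixture), in % of Avicel, from Table 5.
CASES = {
    "CBM4-Ig-GH9-linker1-GH5-Doc": (48.9, 43.7),
    "CBM4-Ig-GH9-linker3-GH5-Doc": (58.6, 46.2),
    "CBM4-Ig-GH9-linker1-GH8-Doc": (41.5, 27.2),
    "CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc": (51.5, 34.9),
    "CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc": (40.6, 39.4),
}


def intramolecular_synergism(fused: float, mixture: float) -> float:
    """Ratio of conversion by the fused construct to that of the component mixture."""
    return fused / mixture


synergism = {name: round(intramolecular_synergism(fused, mixture), 2)
             for name, (fused, mixture) in CASES.items()}

for name, value in synergism.items():
    print(f"{name}: {value:.2f}")
```

A ratio above 1 indicates that covalent linkage yields more soluble sugar than the same modules acting as separate molecules at the same molar loading.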

Traditional synergism factors/ratios are also provided in Table 5, comparing the totals of sugar release by the N-terminal and C-terminal segments of each construct when assayed separately with sugar release by a simple mixture of the two, operating in the same assay vial. The “endo-exo” synergism for simple mixtures of some individual cellulases, such as the mixture of GH9-CBM3c-Doc and CBM4-Ig-GH9-linker1-Doc (0.73), was not high, but the intra-molecular synergism of the corresponding fused construct, CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc, reached a considerably higher level (1.48) upon connection of the two by an appropriate linker. A value less than unity for the synergism ratio for the simple mixture does not necessarily indicate negative synergism, or interference, but it does hint at relatively weak or even negligible synergism, making more impressive the synergism arising from linking the domains. This demonstrates that construction of artificial multifunctional cellulases is a valid approach to improving the activity of cellulases.
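One plausible reading of the traditional synergism ratio is the conversion by the simple mixture divided by the sum of the conversions by the two components assayed separately. The sketch below applies that reading to three of the mixtures in Table 5. The formula and the function name are assumptions made for illustration, and ratios recomputed from the rounded table entries need not match every tabulated value exactly.

```python
def endo_exo_synergy(mixture: float, n_terminal_alone: float, c_terminal_alone: float) -> float:
    """Mixture conversion divided by the summed conversions of the separate components."""
    return mixture / (n_terminal_alone + c_terminal_alone)


# Conversions of Avicel (%) taken from Table 5.
# Each entry: (mixture, CbhA^(a)-linker-doc alone, second cellulase alone).
MIXTURES = {
    "CelG + CbhA(a)-linker3-doc": (46.2, 25.3, 25.3),
    "CelA + CbhA(a)-linker1-doc": (27.2, 13.6, 13.0),
    "CelR + CbhA(a)-linker3-doc": (39.4, 25.3, 30.6),
}

ratios = {name: round(endo_exo_synergy(*values), 2) for name, values in MIXTURES.items()}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Under this reading, a ratio near or below 1 means the mixture releases no more sugar than would be expected from simply adding the separate activities.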

For combinations of two catalytic modules and a linker, it is difficult to find a general rule for the design of multifunctional cellulases based on current results. For example, comparing the architectures CBM4-Ig-GH9-linker1-GH5-Doc and CBM4-Ig-GH9-linker3-GH5-Doc, the intra-molecular synergism obtained with linker3 is greater than that obtained with linker1; in CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc and CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc, however, linker1 is better. The effect of a given linker appears to be sensitive to the properties of the particular combination of modules being linked.

TABLE 5

Gene components | Module structure | Conversion of Avicel (%) | Endo-exo synergy | Intra-molecular synergism
--- | --- | --- | --- | ---
CelG | GH5-Doc | 25.3 ± 0.72 | |
CbhA^(a)-linker1-doc | CBM4-Ig-GH9-linker1-Doc | 13.6 ± 0.24 | |
Mixture of CelG and CbhA^(a)-linker1-doc | Mixture of GH5-Doc and CBM4-Ig-GH9-linker1-Doc | 43.7 ± 0.32 | 1.11 |
CbhA^(a)-linker1-CelG | CBM4-Ig-GH9-linker1-GH5-Doc | 48.9 ± 0.77 | | 1.12
CelG | GH5-Doc | 25.3 ± 0.72 | |
CbhA^(a)-linker3-doc | CBM4-Ig-GH9-linker3-Doc | 25.3 ± 0.29 | |
Mixture of CelG and CbhA^(a)-linker3-doc | Mixture of GH5-Doc and CBM4-Ig-GH9-linker3-Doc | 46.2 ± 1.25 | 0.91 |
CbhA^(a)-linker3-CelG | CBM4-Ig-GH9-linker3-GH5-Doc | 58.6 ± 1.31 | | 1.27
CelA | GH8-Doc | 13.0 ± 0.11 | |
CbhA^(a)-linker1-doc | CBM4-Ig-GH9-linker1-Doc | 13.6 ± 0.24 | |
Mixture of CelA and CbhA^(a)-linker1-doc | Mixture of GH8-Doc and CBM4-Ig-GH9-linker1-Doc | 27.2 ± 0.37 | 1.02 |
CbhA^(a)-linker1-CelA | CBM4-Ig-GH9-linker1-GH8-Doc | 41.5 ± 0.10 | | 1.53
CelR | GH9-CBM3c-Doc | 30.6 ± 0.33 | |
CbhA^(a)-linker1-doc | CBM4-Ig-GH9-linker1-Doc | 13.6 ± 0.24 | |
Mixture of CelR and CbhA^(a)-linker1-doc | Mixture of GH9-CBM3c-Doc and CBM4-Ig-GH9-linker1-Doc | 34.9 ± 0.44 | 0.73 |
CbhA^(a)-linker1-CelR | CBM4-Ig-GH9-linker1-GH9-CBM3c-Doc | 51.5 ± 0.28 | | 1.48
CelR | GH9-CBM3c-Doc | 30.6 ± 0.33 | |
CbhA^(a)-linker3-Doc | CBM4-Ig-GH9-linker3-Doc | 25.3 ± 0.29 | |
Mixture of CelR and CbhA^(a)-linker3-Doc | Mixture of GH9-CBM3c-Doc and CBM4-Ig-GH9-linker3-Doc | 39.4 ± 0.11 | 0.70 |
CbhA^(a)-linker3-CelR | CBM4-Ig-GH9-linker3-GH9-CBM3c-Doc | 40.6 ± 0.39 | | 1.03

Example 7 Time Course Activities of Chimeric Enzymes

FIG. 11 displays progress-curve data for three of the multifunctional cellulases whose activities are described in Table 5, along with curves for their respective single-catalytic-domain constituents. The most notable pattern in the figure as a whole is the difference in the shapes of the curves. The GH9-linker1-GH8 construct (FIG. 11C), which exhibits the greatest degree of intra-molecular synergism but delivers the smallest final (70.3 hour) conversion, acts quickly at the outset, delivering 86.4% of its final conversion in the first 14.5 hours of the digestion. The other two multifunctional constructs, GH9-linker3-GH5 (FIG. 11A) and GH9-linker1-GH5, reach 74.4% and 67.3%, respectively, of their 70.3 hour conversions in the first 14.5 hours. Similar patterns are observed when the curves for the simple mixture of the GH8-containing monofunctional component peptides, and for the GH8 component by itself, are compared with those of their GH5 counterparts.

The Examples discussed above are provided for purposes of illustration and are not intended to be limiting. Still other embodiments and modifications are also contemplated.

While a number of exemplary aspects and embodiments have been discussed above, those of skill in the art will recognize certain modifications, permutations, additions and sub-combinations thereof. It is therefore intended that the following appended claims and claims hereafter introduced are interpreted to include all such modifications, permutations, additions and sub-combinations as are within their true spirit and scope.

We claim:
 1. An isolated nucleic acid molecule encoding a chimeric CbhA polypeptide comprising domains from Clostridium thermocellum cellobiohydrolase A (CbhA) and Caldicellulosiruptor bescii cellobiohydrolase A (CelA) polypeptides, wherein the chimeric CbhA polypeptide has an amino acid sequence at least 95% identical to SEQ ID NO:6, and wherein the chimeric CbhA polypeptide has a cellulase activity at least 1.5-fold greater than the wild-type CbhA polypeptide.
 2. The isolated nucleic acid molecule of claim 1, wherein the chimeric CbhA polypeptide comprises the linker domain from the Caldicellulosiruptor bescii CelA polypeptide.
 3. The isolated nucleic acid molecule of claim 1, wherein the chimeric CbhA polypeptide has a cellulase activity at least 2-fold greater than the wild-type CbhA polypeptide.
 4. The isolated nucleic acid molecule of claim 1, wherein the chimeric CbhA polypeptide has an amino acid sequence at least 97% identical to SEQ ID NO:6.
 5. The isolated nucleic acid molecule of claim 1, wherein the chimeric CbhA polypeptide has the amino acid sequence of SEQ ID NO:6.
 6. The isolated nucleic acid molecule of claim 1, further comprising a promoter operably linked to the nucleic acid molecule.
 7. The isolated nucleic acid molecule of claim 6, wherein the promoter allows expression of the nucleic acid in a bacterial host cell.
 8. An expression vector comprising the nucleic acid molecule of claim 1.
 9. An isolated host cell that expresses a recombinant polypeptide encoded by the nucleic acid molecule of claim 1.
 10. The host cell of claim 9, wherein the cell is an E. coli cell.
 11. An isolated chimeric CbhA polypeptide encoded by the nucleic acid molecule of claim 1.
 12. A method for degrading cellulose or lignocellulosic biomass, comprising contacting a cellulose containing material or lignocellulosic biomass with the isolated chimeric CbhA polypeptide of claim 11.
 13. A method for producing a biofuel from lignocellulosic biomass, comprising: a) contacting the lignocellulosic biomass with an enzyme cocktail comprising the isolated chimeric CbhA polypeptide of claim 11 to generate sugars; and b) converting the sugars to a biofuel by fermentation.
 14. The method of claim 13, wherein the enzyme cocktail further comprises an endoglucanase, a β-glucosidase, or both. 